from IPython.display import HTML
HTML('''<button type="button" class="btn btn-outline-danger" onclick="codeToggle();">Toggle Code</button>''')
India is known for its diversity, be it in cultures, climates, languages, or even food. A plethora of dishes is made to satisfy the hunger of 138 crore people. Some are seasonal, some come and go with festivals, some are made from imported ingredients, some are exported, some are eaten by only a particular community, and some are forbidden in certain religions. What each of us eats is just the tip of the iceberg compared to what the whole of India eats.
We decided to analyse food trends in India. Our data is not food consumption data; instead, it represents the popularity of a food item. Using Google Trends, we obtained the search frequency of multiple dishes and raw ingredients for the years 2019 to 2021. We assume that, within some margin of error, search interest in a food is a reasonable proxy for its consumption. Google Trends also gives the search frequency broken down by Indian state.
Our data needed quite a bit of pre-processing. Google Trends only gives relative data, not absolute counts: each query is normalised so that its peak is 100, which means every food item's popularity is capped at 100 and values from separate queries are not directly comparable. However, a very neat feature is that we can compare up to 5 items together. So we agreed on a 'baseline term' to be searched along with each individual food item. We then downloaded the CSV files, which contain the (relative) number of searches per week in our time range. These values are numerical, and we scaled each file so that the baseline term has the same values throughout. The baseline term we used is 'Drinking water'.
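The notebook does not show the scaling step itself, so the following is only a minimal sketch of one way it could be done. It assumes each downloaded CSV holds the food item and the baseline term side by side; the function name `rescale_to_baseline`, the column name `Drinking water: (India)`, and the target level of 100 are illustrative assumptions, and the numbers are toy values rather than real Google Trends output.

```python
import pandas as pd

def rescale_to_baseline(df, item_col, baseline_col, baseline_level=100.0):
    """Express an item's weekly interest relative to a constant baseline level.

    Hypothetical helper: dividing by the baseline makes values from
    separate Google Trends downloads comparable with each other.
    """
    # Treat weeks where the baseline is 0 as missing to avoid dividing by zero
    baseline = df[baseline_col].replace(0, float("nan"))
    return df[item_col] / baseline * baseline_level

# Toy numbers for illustration only (not real Google Trends output)
toy = pd.DataFrame({
    "Week": ["2019-01-06", "2019-01-13", "2019-01-20"],
    "mango: (India)": [10, 20, 30],
    "Drinking water: (India)": [50, 40, 60],  # assumed baseline column name
})
toy["mango_scaled"] = rescale_to_baseline(
    toy, "mango: (India)", "Drinking water: (India)"
)
print(toy)
```

After this step the baseline would sit at the same level in every file, so a scaled value of 50 means "half as much search interest as the baseline term that week", whichever file it came from.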
# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
from matplotlib.animation import FuncAnimation
from matplotlib import animation
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'notebook'
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected=True)
rc('animation', html='jshtml')
frn = 10 # Number of frames to process in the animation
fps = 0.5 # Frames per second
mywriter = animation.PillowWriter(fps=fps)
Here's what our data for mango looks like:
# Data Mango
mango_df = pd.read_csv("./data/mango.csv")
mango_df.head(10)
| | Week | mango: (India) |
|---|---|---|
| 0 | 2019-01-06 | 10 |
| 1 | 2019-01-13 | 10 |
| 2 | 2019-01-20 | 11 |
| 3 | 2019-01-27 | 10 |
| 4 | 2019-02-03 | 12 |
| 5 | 2019-02-10 | 12 |
| 6 | 2019-02-17 | 12 |
| 7 | 2019-02-24 | 11 |
| 8 | 2019-03-03 | 13 |
| 9 | 2019-03-10 | 14 |
And here's what it looks like on a timeline:
# Mango timeline
fig = px.line(mango_df, x='Week', y='mango: (India)', labels={
"Week": "Timeline",
"mango: (India)": "Mango"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
From this graph, we observe that the interest in mangoes is significantly higher in the months of May-June, which is the peak mango season.
Note: Each point in the plot shows the interest over a specific week.
The data also reveals when a food item became popular for reasons beyond its usual season. Some notable examples are shown below:
# Timeline
dragon_fruit_df = pd.read_csv("./data/dragon_fruit.csv")
fig = px.line(dragon_fruit_df, x='Week', y='dragon fruit: (India)', labels={
"Week": "Timeline",
"dragon fruit: (India)": "Dragon Fruit"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
Search data for dragon fruit in India shows a distinctive peak in January 2021. This peak coincides with the Gujarat Government renaming dragon fruit to 'kamalam'. The distinctive shape and colour of the dragon fruit make it look similar to a lotus, which is called kamal in Hindi. The renaming made the news, and we suspect this is what made people search for dragon fruit. (source)
# Timeline
chocolate_df = pd.read_csv("./data/chocolate.csv")
fig = px.line(chocolate_df, x='Week', y='chocolate: (India)', labels={
"Week": "Timeline",
"chocolate: (India)": "Chocolate"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
This is a charming example of how festivals also have an impact. In every year of our data, chocolate peaks in the second week of February. This is when Valentine's Day is celebrated, which is the most likely driver of chocolate's popularity in February.
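A claim like "the peak falls in the same week every year" can be checked directly rather than read off the plot. The sketch below is a hypothetical helper, `peak_week_per_year`, run on toy numbers (not the real `chocolate.csv`); it assumes the same `Week` column layout as the Google Trends files used here.

```python
import pandas as pd

def peak_week_per_year(df, value_col, week_col="Week"):
    """Return {year: week label} for the week with the highest interest each year."""
    weeks = pd.to_datetime(df[week_col])
    # idxmax gives the row label of the largest value within each year
    peak_idx = df[value_col].groupby(weeks.dt.year).idxmax()
    return {int(year): df.loc[idx, week_col] for year, idx in peak_idx.items()}

# Toy numbers for illustration only
toy = pd.DataFrame({
    "Week": ["2019-02-03", "2019-02-10", "2019-06-02", "2020-02-09", "2020-08-02"],
    "chocolate: (India)": [40, 90, 35, 100, 30],
})
peaks = peak_week_per_year(toy, "chocolate: (India)")
print(peaks)
```

The same helper could be pointed at the mango, undhiyu, or haleem data to see whether their yearly peaks line up with the season or festival in question.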
# Timeline
dalgona_candy_df = pd.read_csv("./data/dalgona_candy.csv")
fig = px.line(dalgona_candy_df, x='Week', y='dalgona candy: (India)', labels={
"Week": "Timeline",
"dalgona candy: (India)": "Dalgona candy"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
Another interesting example is the spike in searches for Dalgona Candy in September 2021, when the Korean drama series Squid Game was released. One of the challenges in the show involves cutting a shape out of Dalgona Candy within a fixed time without breaking it. The candy became famous once the series became a success.
# Timeline
undhiyu_df = pd.read_csv("./data/undhiyu.csv")
fig = px.line(undhiyu_df, x='Week', y='Undhiyu: (India)', labels={
"Week": "Timeline",
"Undhiyu: (India)": "Undhiyu"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
Undhiyu is a seasonal Gujarati dish made in the winter. It is made of many winter vegetables and is cooked in almost every household in Gujarat during Makar Sankranti (January 14), which matches the peaks in the graph.
# Timeline
haleem_df = pd.read_csv("./data/haleem.csv")
fig = px.line(haleem_df, x='Week', y='Haleem: (India)', labels={
"Week": "Timeline",
"Haleem: (India)": "Haleem"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
Haleem is a dish that is mainly prepared during Ramadan, a month of the Islamic Hijri calendar, and in India it is most closely associated with Hyderabad, Telangana. It has a significant cultural history among Muslims and is very famous in the Middle East and the Indian subcontinent. As we can see in the plot above, the spikes align with the Ramadan months of those years. Haleem originated from Harees, which was introduced to the Hyderabad Nizam’s army by Arab soldiers and over time evolved into Haleem. Haleem from Hyderabad is transported all over the country.
A search term not completely unrelated to food, 'homemade', had a huge peak in April-May 2020, when the first COVID lockdown was imposed:
# Timeline
homemade_df = pd.read_csv("./data/homemade.csv")
fig = px.line(homemade_df, x='Week', y='homemade: (India)', labels={
"Week": "Timeline",
"homemade: (India)": "Homemade"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
This most likely reflects people looking for recipes of their favourite foods to make at home, since restaurants and hotels had to close.
Similarly, we figured that a worldwide pandemic affecting our daily lives must also have affected the food in our daily lives. With this data, we also wanted to gain some insight into how COVID affected food searches. Were there more searches than usual, or fewer? And why? We made a line chart of some of the items in our dataset.
# Timeline
covid_slope_df = pd.read_csv("./data/covid_slope_chart.csv")
fig = px.line(covid_slope_df, x='Week', y=covid_slope_df.columns, labels={
"Week": "Timeline",
"value": "Interest",
"variable": "Food item"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
In the first 1-2 months after the COVID outbreak in India, when the country was under lockdown, there was a spike in searches for recipes of many food items that aren’t typically made at home. People could not get items like momos, jalebi, dhokla, panipuri, samosa, and cake from shops and restaurants. This made them interested in preparing these foods at home, which led them to search for the recipes. As seen in the above plot, all the food items have a spike in April 2020. The interest fades afterwards, which suggests that as local shops reopened and people became less cautious during the later COVID waves, they no longer needed to search for recipes online.
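To put a number on the spike, one option is to compare average interest inside the lockdown window with average interest outside it. This is a rough sketch only: the helper name `lockdown_lift`, the window of 25 March to 31 May 2020, and the toy values are assumptions for illustration, and the real `covid_slope_chart.csv` is assumed to have a `Week` column plus one column per food item.

```python
import pandas as pd

def lockdown_lift(df, start="2020-03-25", end="2020-05-31", week_col="Week"):
    """Ratio of mean interest inside the window to mean interest outside it."""
    weeks = pd.to_datetime(df[week_col])
    in_window = (weeks >= pd.Timestamp(start)) & (weeks <= pd.Timestamp(end))
    values = df.drop(columns=[week_col])
    # A ratio above 1 means the item was searched more during the window
    return values[in_window].mean() / values[~in_window].mean()

# Toy numbers for illustration only
toy = pd.DataFrame({
    "Week": ["2020-01-05", "2020-02-02", "2020-04-05", "2020-04-12", "2020-07-05"],
    "momos": [10, 10, 40, 40, 10],
    "cake": [20, 20, 30, 30, 20],
})
lift = lockdown_lift(toy)
print(lift)
```

A lift well above 1 for every item would support the reading of the plot; items with a lift near 1 would be the ones whose searches did not react to the lockdown.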
import json

# Load the state boundaries and build a lookup from state name to state code
with open("data/states_india.geojson", "r") as f:
    india_geodata = json.load(f)
state_id_map = {}
for feature in india_geodata["features"]:
    feature["id"] = feature["properties"]["state_code"]
    state_id_map[feature["properties"]["st_nm"]] = feature["id"]
def generate_map(csv_name, title):
    """Draw a state-wise choropleth of the 'Interest' column in data/<csv_name>."""
    df = pd.read_csv("data/" + csv_name)
    df["Interest"] = df["Interest"].fillna(0)
    # Rename states to match the spellings used in the GeoJSON file
    df.loc[df['State'] == "Delhi", 'State'] = "NCT of Delhi"
    df.loc[df['State'] == "Dadra and Nagar Haveli", 'State'] = "Dadara & Nagar Havelli"
    df.loc[df['State'] == "Andaman and Nicobar Islands", 'State'] = "Andaman & Nicobar Island"
    df.loc[df['State'] == "Jammu and Kashmir", 'State'] = "Jammu & Kashmir"
    df.loc[df['State'] == "Daman and Diu", 'State'] = "Daman & Diu"
    df.loc[df['State'] == "Arunachal Pradesh", 'State'] = "Arunanchal Pradesh"
    df["id"] = df["State"].apply(lambda x: state_id_map[x])
    fig = px.choropleth(df, geojson=india_geodata, locations='id', color='Interest',
                        color_continuous_scale="Blues",
                        range_color=(0, 100),
                        hover_name="State",
                        hover_data=["Interest"],
                        title=title,
                        )
    fig.update_geos(fitbounds="locations", visible=False)
    fig.update_layout(margin={"r":5,"t":50,"l":5,"b":10})
    return fig
fig = generate_map("undhiyu_geomap.csv", "State Wise Relative Trend: Undhiyu")
fig.show()
In the above plot, we can observe that Undhiyu is more popular in Gujarat than in other states of India, since it is a seasonal Gujarati dish.
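The lookup `state_id_map[x]` in `generate_map` raises a `KeyError` whenever a state name in the CSV is spelled differently from the GeoJSON, which is why the function renames several states first. A small check like the one below can list mismatches before plotting. The helper `unmatched_states` is hypothetical, and the state codes in the toy map are made up for the example.

```python
def unmatched_states(state_names, state_id_map, renames=None):
    """Return the state names that still have no entry in state_id_map after renaming."""
    renames = renames or {}
    resolved = [renames.get(name, name) for name in state_names]
    return sorted({name for name in resolved if name not in state_id_map})

# Toy lookup with made-up codes, for illustration only
toy_state_id_map = {"NCT of Delhi": 7, "Gujarat": 24}
toy_renames = {"Delhi": "NCT of Delhi"}

missing = unmatched_states(["Delhi", "Gujarat", "Ladakh"], toy_state_id_map, toy_renames)
print(missing)
```

Running such a check once per CSV makes it clear which renames are still needed, instead of discovering them one `KeyError` at a time.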
fig = generate_map("haleem_geomap.csv", "State Wise Relative Trend: Haleem")
fig.show()
As mentioned with the Interest Over Time plot of Haleem, the dish is far more popular in Hyderabad than in other regions. Accordingly, in the above plot, Telangana has the highest search interest for Haleem.
fig = generate_map("misalpav_geomap.csv", "State Wise Relative Trend: Misalpav")
fig.show()
Misal Pav is a famous dish from Maharashtra made of misal and bread (pav), similar to other famous Maharashtrian dishes like vada pav. In the above plot, we can observe that the Google searches for misal pav are concentrated in Maharashtra. Moreover, we see zero search interest for misal pav in the north-eastern states, which suggests that misal pav is a highly localized dish. It seems likely that very few people in those states are familiar with it.
fig = generate_map("pitha_geomap.csv", "State Wise Relative Trend: Pitha")
fig.show()
There are different varieties of pitha, which are mainly popular in Odisha, Bihar, Jharkhand, West Bengal and the north-eastern states. In Assam, pitha is made for special occasions like Bihu, and in West Bengal, special pithas are made during the harvest festivals. In the above plot, we can observe the high popularity of pitha in these states.
fig = generate_map("biryani_geomap.csv", "State Wise Relative Trend: Biryani")
fig.show()
In the above geoplot, we can observe that the popularity of Biryani gradually decreases from the southern states to the northern states. This has both historical and cultural reasons. South Indians are more used to a rice-based diet, while the staple of North India is wheat. Biryani makes a very convenient and complete dish in South India, which is why it's popular there.
We would also like to show the power of language in different regions. We use the example of buttermilk to show this. The following is the state-wise interest of buttermilk all over India.
fig = generate_map("buttermilk_geomap.csv", "State Wise Relative Trend: ButterMilk")
fig.show()
We can see that it's fairly uniform. However, not everyone uses the English term to refer to buttermilk. We used the data for the 5 most popular words for buttermilk in different languages. It is called moru in Tamil and Malayalam, chaas in Gujarati, ghol in Rajasthani, majjiga in Telugu, and laban in Bengali. The region-wise interest for these terms is shown in the following geoplot:
df = pd.read_csv("data/buttermilk_multilanguage.csv")
items = ["Moru", "Chaas", "Ghol", "Majjiga", "Laban"]
for item in items:
    df[item] = df[item].fillna("0%")
df.loc[df['State'] == "Delhi", 'State'] = "NCT of Delhi"
df.loc[df['State'] == "Dadra and Nagar Haveli", 'State'] = "Dadara & Nagar Havelli"
df.loc[df['State'] == "Andaman and Nicobar Islands", 'State'] = "Andaman & Nicobar Island"
df.loc[df['State'] == "Jammu and Kashmir", 'State'] = "Jammu & Kashmir"
df.loc[df['State'] == "Daman and Diu", 'State'] = "Daman & Diu"
df.loc[df['State'] == "Arunachal Pradesh", 'State'] = "Arunanchal Pradesh"
df["id"] = df["State"].apply(lambda x: state_id_map[x])
fig = px.choropleth(df, geojson=india_geodata, color="Most Popular",
locations="id",
projection="mercator", hover_data=["Moru", "Chaas", "Ghol", "Majjiga", "Laban"],
color_discrete_map={
"Chaas": "slateblue",
"Ghol": "violet",
"Majjiga": "orange",
"Moru": "tomato",
"Laban": "dodgerblue",
"None": "lightgray"
},
title="Most Popular Regional Name for Buttermilk"
)
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r":5,"t":50,"l":5,"b":10})
fig.show()
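The map above colours each state by the `Most Popular` column, which is read straight from the CSV; how that column was produced is not shown. As a sketch under that caveat, it could be derived from the per-term columns, assuming they hold percentage strings such as `"80%"` (which the `fillna("0%")` call suggests). The helper name `most_popular_term` and the toy rows are illustrative, not the real data.

```python
import pandas as pd

TERMS = ["Moru", "Chaas", "Ghol", "Majjiga", "Laban"]

def most_popular_term(df, terms=TERMS):
    """Pick the term with the largest share per row, or "None" if all shares are zero."""
    # Convert strings like "80%" to 80.0; missing values count as 0
    shares = df[terms].fillna("0%").apply(lambda col: col.str.rstrip("%").astype(float))
    winner = shares.idxmax(axis=1)
    # Rows with no searches for any term get the label "None"
    return winner.where(shares.max(axis=1) > 0, "None")

# Toy rows for illustration only
toy = pd.DataFrame({
    "State": ["Kerala", "Gujarat", "Sikkim"],
    "Moru": ["80%", "5%", None],
    "Chaas": ["20%", "95%", None],
    "Ghol": ["0%", "0%", None],
    "Majjiga": ["0%", "0%", None],
    "Laban": ["0%", "0%", None],
})
toy["Most Popular"] = most_popular_term(toy)
print(toy[["State", "Most Popular"]])
```

The `"None"` label matches the `"None": "lightgray"` entry in `color_discrete_map`, so states without any searches would be drawn in grey.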
This regional split cannot be seen in a timeline. The nationwide timeline for the regional buttermilk terms shows no unusual pattern, as seen below:
# Buttermilk in different languages
buttermilk_df = pd.read_csv("./data/multi_buttermilk.csv")
fig = px.line(buttermilk_df, x='Week', y=buttermilk_df.columns, labels={
"Week": "Timeline",
"value": "Interest",
"variable": "Term"
}, height=400)
fig.update_layout(margin = {
"r": 50,
"b": 50,
"t": 50,
"l": 50
})
fig.update_layout(title = {
"text": "Interest Over Time",
"x": 0.5,
"y": 0.95
})
fig.show()
The diversity of India allows for such an interesting data analysis of food. It is so diverse that analysing everything is beyond the scope of this assignment. However, we believe we covered the main highlights, even though our selection may be biased towards some states.